OCPBUGS-14877: Validate that number hosts does not exceed replicas #7268

Merged
openshift-merge-robot merged 1 commit into openshift:master from bfournie:check-replicas
Jul 26, 2023

Contributor

bfournie commented Jun 21, 2023

A check was added in https://issues.redhat.com//browse/OCPBUGS-10342 to warn when the number of replicas exceeds the number of configured hosts. However, it did not catch the case where the number of configured hosts exceeds the replicas, so that check is added here. In addition, defining too many hosts for the replicas now results in an error instead of a warning, since this invalid configuration will cause installation failures.

openshift-ci-robot added the jira/severity-moderate, jira/valid-reference, and jira/valid-bug labels Jun 21, 2023

@openshift-ci-robot
Contributor

@bfournie: This pull request references Jira Issue OCPBUGS-14877, which is valid. The bug has been moved to the POST state.

3 validation(s) were run on this bug
  • bug is open, matching expected state (open)
  • bug target version (4.14.0) matches configured target version for branch (4.14.0)
  • bug is in the state ASSIGNED, which is one of the valid states (NEW, ASSIGNED, POST)

Requesting review from QA contact:
/cc @mhanss

The bug has been updated to refer to the pull request using the external bug tracker.

@openshift-ci openshift-ci Bot requested review from dhellmann, lranjbar and mhanss June 21, 2023 13:22
Member

zaneb left a comment

It's never an error to not have enough hosts in a role, only too many.
Unit tests are failing which may be related.

Comment thread pkg/asset/agent/manifests/nmstateconfig.go Outdated
Member

The effect of this is that the user has to specify either all or none of the masters. Since each of the masters is theoretically optional, and this is a new check, I definitely don't think this is something we could backport.

For this to be an error, I think instead of the condition numMasters != 0 && numMasters < numRequiredMasters it should be:

numMasters < numRequiredMasters && (numRequiredMasters - numMasters) > ((numRequiredMasters + numRequiredWorkers) - (numMasters + numWorkers))

i.e. the number of masters without a host definition is greater than the number of remaining undefined hosts.

This simplifies to numMasters < numRequiredMasters && numWorkers > numRequiredWorkers. Since this is a strict subset of the check on line 380, there's no need to return an error here at all.

It would still be unusual for users to configure hosts for only some of the masters, so I think we should change this error back to a warning.
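
The simplification above can be checked mechanically. This is a minimal sketch, not code from the PR: it brute-forces small non-negative counts and confirms that the long condition and the short one always agree. The function names are made up for illustration; the variable names follow the comment.

```go
package main

import "fmt"

// longMasterCond is the condition as first written in the review comment:
// more masters lack a host definition than there are undefined hosts left.
func longMasterCond(numMasters, numWorkers, numRequiredMasters, numRequiredWorkers int) bool {
	return numMasters < numRequiredMasters &&
		(numRequiredMasters-numMasters) > ((numRequiredMasters+numRequiredWorkers)-(numMasters+numWorkers))
}

// shortMasterCond is the simplified form given in the same comment.
func shortMasterCond(numMasters, numWorkers, numRequiredMasters, numRequiredWorkers int) bool {
	return numMasters < numRequiredMasters && numWorkers > numRequiredWorkers
}

// countMasterMismatches counts the combinations with every value in
// [0, limit) for which the two forms disagree.
func countMasterMismatches(limit int) int {
	mismatches := 0
	for m := 0; m < limit; m++ {
		for w := 0; w < limit; w++ {
			for rm := 0; rm < limit; rm++ {
				for rw := 0; rw < limit; rw++ {
					if longMasterCond(m, w, rm, rw) != shortMasterCond(m, w, rm, rw) {
						mismatches++
					}
				}
			}
		}
	}
	return mismatches
}

func main() {
	fmt.Println("mismatches:", countMasterMismatches(8))
}
```

The two forms agree because subtracting numRequiredMasters - numMasters from both sides of the inequality leaves 0 > numRequiredWorkers - numWorkers, which is numWorkers > numRequiredWorkers.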

Member

Likewise, for this to be an error, instead of numWorkers != 0 && numWorkers < numRequiredWorkers this condition should be something like

numWorkers < numRequiredWorkers && (numRequiredWorkers - numWorkers) > ((numRequiredMasters + numRequiredWorkers) - (numMasters + numWorkers) - (numRequiredMasters - numMasters))

i.e. the number of workers without a host definition is greater than the number of remaining undefined hosts, excluding undefined hosts that should be masters. (This assumes that we don't have too many masters defined, which already causes an error on line 371.)

It turns out that this can never be true, because it simplifies to numWorkers < numRequiredWorkers && (numRequiredWorkers - numWorkers) > (numRequiredWorkers - numWorkers) so there is no time we want to return an error here.

Let's change this one back to a warning also.
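
To illustrate why the worker condition can never hold, here is a small sketch (not code from the PR) that evaluates it over a range of small counts. The function names are hypothetical; the variable names follow the comment.

```go
package main

import "fmt"

// workerErrCond is the candidate error condition for workers from the
// review comment, split into named intermediate values for readability.
func workerErrCond(numMasters, numWorkers, numRequiredMasters, numRequiredWorkers int) bool {
	undefinedHosts := (numRequiredMasters + numRequiredWorkers) - (numMasters + numWorkers)
	undefinedMasters := numRequiredMasters - numMasters
	return numWorkers < numRequiredWorkers &&
		(numRequiredWorkers-numWorkers) > (undefinedHosts-undefinedMasters)
}

// countWorkerErrHits counts the combinations with every value in
// [0, limit) for which the condition is true.
func countWorkerErrHits(limit int) int {
	hits := 0
	for m := 0; m < limit; m++ {
		for w := 0; w < limit; w++ {
			for rm := 0; rm < limit; rm++ {
				for rw := 0; rw < limit; rw++ {
					if workerErrCond(m, w, rm, rw) {
						hits++
					}
				}
			}
		}
	}
	return hits
}

func main() {
	fmt.Println("hits:", countWorkerErrHits(8))
}
```

The right-hand side always reduces to numRequiredWorkers - numWorkers, so the comparison asks whether a value is strictly greater than itself.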

Member

Technically this can't be true when numMasters == 0, since numRequiredMasters can't be negative so it doesn't matter, but it's a bit confusing having this inside the if numMasters != 0 block.

Member

Same comment about nesting this inside if numWorkers != 0

Contributor Author

bfournie commented Jun 27, 2023

> It's never an error to not have enough hosts in a role, only too many. Unit tests are failing which may be related.

Yeah, when not enough hosts are defined we can't know that other hosts aren't being booted, so I'll remove the error.
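
A rough sketch of the behaviour agreed on here: too many hosts for a role is an error, while too few is at most a warning. The helper `checkHostCount` and its messages are hypothetical and are not taken from pkg/asset/agent/manifests/nmstateconfig.go.

```go
package main

import "fmt"

// checkHostCount is a hypothetical helper, not the installer's actual code.
// It returns an error when more hosts are configured for a role than
// replicas are requested, and a warning string when only some of the
// replicas have a configured host.
func checkHostCount(role string, numHosts, numRequired int) (string, error) {
	if numHosts > numRequired {
		return "", fmt.Errorf("%d %s hosts are configured but only %d replicas are requested",
			numHosts, role, numRequired)
	}
	if numHosts != 0 && numHosts < numRequired {
		return fmt.Sprintf("only %d of %d %s replicas have a configured host",
			numHosts, numRequired, role), nil
	}
	return "", nil
}

func main() {
	cases := []struct {
		role     string
		hosts    int
		required int
	}{
		{"master", 4, 3}, // too many hosts
		{"worker", 1, 2}, // some hosts left undefined
		{"worker", 0, 2}, // no hosts defined for the role
	}
	for _, c := range cases {
		warning, err := checkHostCount(c.role, c.hosts, c.required)
		switch {
		case err != nil:
			fmt.Println("error:", err)
		case warning != "":
			fmt.Println("warning:", warning)
		default:
			fmt.Println("ok")
		}
	}
}
```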

@bfournie
Contributor Author

/hold

openshift-ci Bot added the do-not-merge/hold label (indicates that a PR should not merge because someone has issued a /hold command) Jun 27, 2023

@bfournie
Contributor Author

/unhold

openshift-ci Bot removed the do-not-merge/hold label Jun 28, 2023

A check was added in https://issues.redhat.com//browse/OCPBUGS-10342
to warn when the number of replicas exceeded the configured hosts,
however this did not catch the case when the number of configured
hosts exceeds the replicas, so it is added here. In addition this
validation will now result in an error instead of a warning since
this invalid configuration will cause installation failures.

@openshift-ci-robot
Contributor

@bfournie: This pull request references Jira Issue OCPBUGS-14877, which is valid.

3 validation(s) were run on this bug
  • bug is open, matching expected state (open)
  • bug target version (4.14.0) matches configured target version for branch (4.14.0)
  • bug is in the state POST, which is one of the valid states (NEW, ASSIGNED, POST)

Requesting review from QA contact:
/cc @mhanss

Contributor

openshift-ci Bot commented Jun 29, 2023

@bfournie: The following tests failed, say /retest to rerun all failed tests or /retest-required to rerun all mandatory failed tests:

Test name | Commit | Details | Required | Rerun command
--- | --- | --- | --- | ---
ci/prow/okd-e2e-aws-ovn-upgrade | 71c32a8 | link | false | /test okd-e2e-aws-ovn-upgrade
ci/prow/okd-scos-e2e-aws-ovn | 71c32a8 | link | false | /test okd-scos-e2e-aws-ovn
ci/prow/okd-e2e-aws-ovn | 71c32a8 | link | false | /test okd-e2e-aws-ovn

Contributor

rwsu left a comment

/lgtm

openshift-ci Bot added the lgtm label Jul 25, 2023

Contributor

pawanpinjarkar left a comment

/approve

Contributor

openshift-ci Bot commented Jul 26, 2023

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: pawanpinjarkar

openshift-ci Bot added the approved label Jul 26, 2023

openshift-merge-robot merged commit 9b43ffc into openshift:master Jul 26, 2023

@openshift-ci-robot
Contributor

@bfournie: Jira Issue OCPBUGS-14877: All pull requests linked via external trackers have merged:

Jira Issue OCPBUGS-14877 has been moved to the MODIFIED state.

Labels

  • approved: Indicates a PR has been approved by an approver from all required OWNERS files.
  • jira/severity-moderate: Referenced Jira bug's severity is moderate for the branch this PR is targeting.
  • jira/valid-bug: Indicates that a referenced Jira bug is valid for the branch this PR is targeting.
  • jira/valid-reference: Indicates that this PR references a valid Jira ticket of any type.
  • lgtm: Indicates that a PR is ready to be merged.

6 participants